
Conversation

@MasterPtato
Contributor

No description provided.

@vercel

vercel bot commented Nov 14, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Preview | Updated (UTC) |
| --- | --- | --- | --- |
| rivetkit-serverless | Error | Error | Nov 18, 2025 11:53pm |

3 Skipped Deployments

| Project | Deployment | Preview | Updated (UTC) |
| --- | --- | --- | --- |
| rivet-cloud | Ignored | Ignored | Nov 18, 2025 11:53pm |
| rivet-inspector | Ignored | Ignored (Preview) | Nov 18, 2025 11:53pm |
| rivet-site | Ignored | Ignored (Preview) | Nov 18, 2025 11:53pm |

@vercel vercel bot temporarily deployed to Preview – rivetkit-serverless November 14, 2025 01:40 Inactive
Contributor Author

MasterPtato commented Nov 14, 2025

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more


How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@claude

claude bot commented Nov 14, 2025

PR Review: fix(udb): update backoff algo

Summary

This PR updates the transaction retry backoff algorithm to more closely align with FoundationDB's behavior. Overall, the changes look good and represent a significant improvement.

Code Quality: ✅ Good

Strengths:

  • Excellent documentation with concrete examples showing the backoff progression
  • References to FoundationDB source code provide good context
  • Clean, readable implementation

Suggestions:

  1. Documentation clarity: The comment at line 45 says "Jitter is 0-10% of backoff ms" but this is slightly misleading:

    • At attempt 0: jitter is 0% (base=1, so jitter is always 0)
    • At attempt 1+: jitter is 0-10% as stated

    Consider updating the comment to: // Jitter is 0-10% of backoff ms (proportional to base)

Correctness: ✅ Verified

Algorithm Analysis:

Old algorithm:

  • Max attempt capped at 10
  • Max backoff: 2^10 * 10 = 10,240ms
  • Fixed jitter: 0-99ms (not proportional)

New algorithm:

  • Max attempt capped at 7 ✅
  • Max backoff: 2^7 * 10 = 1,280ms ✅ (aligns with FDB's 1s max mentioned in docs)
  • Proportional jitter: 0-10% ✅

The math checks out correctly:

  • base = 2^min(attempt, 7) produces: 1, 2, 4, 8, 16, 32, 64, 128
  • base_backoff_ms = base * 10 produces: 10, 20, 40, 80, 160, 320, 640, 1280ms
  • jitter_ms = rand() % base produces proportional jitter

Potential Issues: ⚠️ Minor

Edge case - Attempt 0:
At attempt 0, base = 1, so jitter_ms = rand::random::<u64>() % 1 will always be 0. This is fine functionally, but means the first retry has no jitter. This could lead to thundering herd issues if many transactions fail simultaneously on their first attempt.

Recommendation: Consider starting jitter from attempt 0. For example:

// Option 1: widen the modulus by 1 so attempt 0 can draw 0-1ms of jitter
let jitter_ms = rand::random::<u64>() % (base + 1);

// Option 2: enforce a minimum jitter range
let jitter_ms = rand::random::<u64>() % base.max(10);

However, this may be intentional to match FDB's behavior (which appears to have no jitter based on the referenced code), so this is more of a consideration than a required fix.

Performance: ✅ Excellent

The new algorithm is actually better for performance:

  • Faster max backoff (1.28s vs 10.24s) means transactions retry more aggressively
  • Proportional jitter is more appropriate than fixed jitter for preventing thundering herds
  • Better alignment with FoundationDB's proven retry strategy

Security: ✅ No concerns

The use of rand::random() for jitter is appropriate for this use case (not cryptographic).

Test Coverage: ✅ Adequate

The function is used in:

  1. postgres/database.rs - production usage
  2. rocksdb/database.rs - production usage
  3. tests/rocksdb.rs - test usage (though this test has a duplicate implementation of the old algorithm at lines 140-146)

Note: The test file at tests/rocksdb.rs:140-146 has a duplicate implementation of the old algorithm. This should probably be removed since the test should use the shared implementation from utils::calculate_tx_retry_backoff.

Overall Assessment: ✅ Approve with minor suggestions

This is a solid improvement that brings the retry logic closer to FoundationDB's proven approach. The main suggestions are:

  1. Consider adding jitter for attempt 0 to prevent thundering herd (optional, depends on FDB alignment goals)
  2. Clean up the duplicate implementation in tests/rocksdb.rs
  3. Minor documentation clarification about jitter calculation

Great work on the detailed documentation and FDB source references!

@MasterPtato MasterPtato force-pushed the 11-13-fix_udb_update_backoff_algo branch from 9b19169 to a0dfaaf Compare November 14, 2025 02:40
@MasterPtato MasterPtato force-pushed the 11-12-chore_write_update_docs branch from 52bea63 to ce30dce Compare November 14, 2025 02:40
@vercel vercel bot temporarily deployed to Preview – rivetkit-serverless November 14, 2025 02:40 Inactive
@claude

claude bot commented Nov 14, 2025

Code Review: fix(udb): update backoff algo

Summary

This PR updates the transaction retry backoff algorithm in UniversalDB to better align with FoundationDB's behavior. Overall, this is a good improvement that makes the backoff more aggressive and consistent with FDB's design.

✅ Positive Changes

  1. Better alignment with FDB behavior: The new algorithm caps at attempt 7 (max ~1.28s) instead of attempt 10 (max ~10.24s), which is closer to FDB's 1-second max backoff.

  2. Proportional jitter: The jitter is now proportional to the backoff (0-10% of base backoff) rather than a fixed 0-100ms. This is more sensible for larger backoffs.

  3. Excellent documentation: The inline documentation comparing the implementation to FDB with specific links is very helpful.

🐛 Potential Issues

Critical: Modulo bias in jitter calculation

Location: engine/packages/universaldb/src/utils/mod.rs:46

let jitter_ms = rand::random::<u64>() % base;

Issue: Using modulo with rand::random::<u64>() introduces modulo bias. When u64::MAX + 1 is not evenly divisible by base, some values will appear more frequently than others. While the bias is small for small values of base (max 128 in this case), it's still technically incorrect.

Recommendation: Use a proper bounded random range:

use rand::Rng;
let jitter_ms = rand::thread_rng().gen_range(0..base);

This requires bringing the rand crate's Rng trait into scope; rand is evidently already a dependency, since the current code calls rand::random.

Minor: Inconsistency in test file

Location: engine/packages/universaldb/tests/rocksdb.rs:140

The test file has its own copy of calculate_tx_retry_backoff that still uses the old algorithm (max attempt 10). This should be updated to use the exported function from the main library or at least be kept in sync:

// Instead of duplicating, import from the library
use universaldb::calculate_tx_retry_backoff;

📊 Behavioral Analysis

Old algorithm:

  • Attempt 0: 10ms + 0-100ms jitter = 10-110ms
  • Attempt 5: 320ms + 0-100ms jitter = 320-420ms
  • Attempt 10 (max): 10,240ms + 0-100ms jitter = 10.24-10.34s

New algorithm:

  • Attempt 0: 10ms + 0ms jitter (rand % 1 is always 0) = 10ms
  • Attempt 5: 320ms + 0-32ms jitter = 320-352ms
  • Attempt 7 (max): 1,280ms + 0-128ms jitter = 1.28-1.41s

The new algorithm is:

  • More aggressive (lower max backoff)
  • More consistent (proportional jitter)
  • Better aligned with FDB's 1-second max

🔒 Security Considerations

No security concerns identified.

🧪 Test Coverage

No new tests were added. Consider adding a unit test to verify:

  1. The backoff values for known attempt numbers
  2. That jitter is within expected bounds
  3. That the max cap (attempt 7) works correctly

Example test:

#[test]
fn test_backoff_algorithm() {
    // Test specific attempts (base backoff plus bounded jitter)
    let backoff_0 = calculate_tx_retry_backoff(0);
    assert_eq!(backoff_0, 10); // base 10ms; jitter is always 0 at attempt 0
    let backoff_1 = calculate_tx_retry_backoff(1);
    assert!(backoff_1 >= 20 && backoff_1 < 22);

    // Test max cap
    let backoff_7 = calculate_tx_retry_backoff(7);
    let backoff_100 = calculate_tx_retry_backoff(100);
    // Both should have same base (capped at 7)
    assert!(backoff_7 >= 1280 && backoff_7 < 1408);
    assert!(backoff_100 >= 1280 && backoff_100 < 1408);
}

📝 Recommendations

Required:

  1. Fix the modulo bias in jitter calculation (use rand::thread_rng().gen_range(0..base))
  2. Update or remove the duplicate function in tests/rocksdb.rs:140

Optional:
  1. Add unit tests for the backoff algorithm
  2. Consider verifying the FDB source links are correct (I couldn't fetch them to verify)

Overall Assessment

Approved with suggested fixes

The core change is sound and improves the backoff algorithm. The modulo bias issue should be addressed before merging, and the test file inconsistency should be resolved.

@claude

claude bot commented Nov 14, 2025

Code Review

Summary

This PR updates the exponential backoff algorithm for transaction retries in UniversalDB to better align with FoundationDB's implementation. The changes improve the retry behavior by capping the max backoff and adjusting the jitter calculation.

Positive Changes ✅

  1. Better max backoff cap: Changed from min(10) (max 10.24s) to min(7) (max 1.28s), which is closer to FDB's 1s max
  2. Proportional jitter: The jitter is now 0-10% of the backoff (was fixed 0-100ms), making it scale appropriately with the backoff duration
  3. Excellent documentation: The comments clearly explain the algorithm with examples and reference FDB source code

Issues Found 🔍

1. CRITICAL: Outdated duplicate function in tests

Location: engine/packages/universaldb/tests/rocksdb.rs:140-146

There's a duplicate calculate_tx_retry_backoff function in the test file that still uses the OLD algorithm:

pub fn calculate_tx_retry_backoff(attempt: usize) -> u64 {
    let base_backoff_ms = 2_u64.pow((attempt as u32).min(10)) * 10;
    let jitter_ms = rand::random::<u64>() % 100;
    base_backoff_ms + jitter_ms
}

Action required: This duplicate should be removed and the test should import from universaldb::calculate_tx_retry_backoff instead. Having two different implementations will cause confusion and inconsistent behavior.

2. Potential arithmetic issue in jitter calculation

Location: engine/packages/universaldb/src/utils/mod.rs:46

The jitter calculation rand::random::<u64>() % base has an issue:

  • For attempt 0: base = 1, so jitter is random % 1 = 0 (always 0, not 0-1ms as documented)
  • For attempt 1: base = 2, so jitter is random % 2 (0-1ms, not 0-2ms as documented)
  • For attempt 2: base = 4, so jitter is random % 4 (0-3ms, not 0-4ms as documented)

The comment says "0-10% of backoff ms" but the actual range is 0 to (base-1)ms, which is slightly less than 10% of base_backoff_ms.

Suggested fix:

// Jitter is 0-10% of backoff ms
let jitter_ms = rand::random::<u64>() % (base_backoff_ms / 10 + 1);

This would give:

  • Attempt 0: 0-1ms jitter (10ms backoff)
  • Attempt 1: 0-2ms jitter (20ms backoff)
  • Attempt 2: 0-4ms jitter (40ms backoff)
  • Attempt 7: 0-128ms jitter (1280ms backoff)

Which matches the documentation exactly.

3. Missing unit tests

There don't appear to be any unit tests specifically for calculate_tx_retry_backoff. Given that this is a critical function for retry behavior, it should have tests verifying:

  • The exponential growth curve
  • The max cap at attempt 7+
  • The jitter is within expected bounds
  • The values roughly match FDB's behavior

Suggested test:

#[test]
fn test_tx_retry_backoff() {
    // Test base values without jitter (by checking range)
    for attempt in 0..10 {
        let backoff = calculate_tx_retry_backoff(attempt);
        let expected_base = 2_u64.pow((attempt as u32).min(7)) * 10;
        let max_jitter = 2_u64.pow((attempt as u32).min(7));
        
        assert!(backoff >= expected_base);
        assert!(backoff < expected_base + max_jitter);
    }
    
    // Verify max cap
    assert!(calculate_tx_retry_backoff(7) < 1500); // Should be ~1280ms + jitter
    assert!(calculate_tx_retry_backoff(100) < 1500); // Should still be capped
}

Minor Suggestions 💡

  1. Consider using rand::thread_rng().gen_range() instead of rand::random::<u64>() % n to avoid modulo bias (negligible here, since n is at most 128 and tiny relative to the u64 range)

  2. Document the dependency on FDB knobs: The comment references specific FDB source files. Consider noting that this should be kept in sync with FDB updates.

Security Considerations 🔒

No security concerns identified. The backoff algorithm is appropriate for preventing retry storms.

Performance Considerations ⚡

The changes improve performance by reducing max backoff from ~10s to ~1.28s, allowing faster recovery from transient conflicts.

Verdict

Good improvement to align with FDB behavior, but needs the test duplicate removed and the jitter calculation fixed to match the documentation. Consider adding unit tests for this critical function.


🤖 Generated with Claude Code

@claude

claude bot commented Nov 17, 2025

PR Review: fix(udb): update backoff algo

Summary

This PR updates the exponential backoff algorithm for transaction retries in UniversalDB to more closely align with FoundationDB's implementation. The changes improve the retry behavior with better jitter and a more appropriate max backoff.

Positive Changes ✅

  1. Better max backoff cap: Changed from 2^10 = 1024 (attempt 10) to 2^7 = 128 (attempt 7), resulting in a max base backoff of 1280ms vs 10240ms. This is more reasonable and closer to FDB's 1s max.

  2. Proportional jitter: The new jitter calculation (rand::random::<u64>() % base) provides 0-10% jitter relative to the backoff time, which is more appropriate than the fixed 0-100ms range in the old implementation.

  3. Excellent documentation: The added docstring clearly explains the behavior with examples and references to the FDB source code.

Issues & Concerns 🚨

1. Critical Bug: Jitter Calculation is Incorrect (engine/packages/universaldb/src/utils/mod.rs:46)

let jitter_ms = rand::random::<u64>() % base;

The comment says "Jitter is 0-10% of backoff ms", but `% base` yields 0 to base-1, so the actual range falls just short of the documented bounds.

For attempt 0:

  • base = 2^0 = 1
  • jitter_ms = rand() % 1 → 0ms only
  • Result: 10ms + 0ms = 10ms

For attempt 1:

  • base = 2^1 = 2
  • jitter_ms = rand() % 2 → 0-1ms
  • Result: 20ms + 0-1ms = 20-21ms ❌ (should be 20-22ms per docstring)

For attempt 7:

  • base = 2^7 = 128
  • jitter_ms = rand() % 128 → 0-127ms
  • Result: 1280ms + 0-127ms = 1280-1407ms ❌ (should be 1280-1408ms per docstring)

Fix: To match the docstring (0-10% of backoff), it should be:

let jitter_ms = rand::random::<u64>() % (base + 1);

Or update the docstring to reflect actual behavior (0 to base-1 ms of jitter, i.e. just under 10% of base_backoff_ms).

2. Stale Test Function (engine/packages/universaldb/tests/rocksdb.rs:140-146)

There's a duplicate implementation of calculate_tx_retry_backoff in the test file using the OLD algorithm. This should be removed and the function should be imported from universaldb::utils::calculate_tx_retry_backoff instead to ensure tests use the actual implementation.

3. Random Number Generation Concerns (engine/packages/universaldb/src/utils/mod.rs:46)

  • rand::random::<u64>() generates a full 64-bit random number, then takes modulo. For small values of base, this is inefficient.
  • Consider using rand::thread_rng().gen_range(0..base) for cleaner range generation, though this may require adding rand as a dependency with the std feature if not already available.

Minor Observations

  1. Performance: The new algorithm with lower max backoff (1280ms vs 10240ms) will result in faster retry cycles in high-contention scenarios. This is generally good, but ensure this aligns with your system's behavior under load.

  2. Consistency: Good that the docstring includes both your implementation and FDB's for comparison. This makes the differences clear.

Recommendations

Must Fix:

  1. Fix the jitter calculation to match the docstring or update the docstring
  2. Remove the duplicate test implementation and import from the actual module

Nice to Have:

  1. Consider using gen_range for cleaner random number generation
  2. Add a unit test for calculate_tx_retry_backoff to verify the backoff values match expectations

Test Coverage

The function is tested indirectly through the rocksdb tests, but would benefit from a dedicated unit test to verify the backoff progression matches the documented behavior.


Overall, this is a good improvement to the backoff algorithm, but the jitter calculation bug needs to be fixed before merging.

@claude

claude bot commented Nov 17, 2025

PR Review: fix(udb): update backoff algo

Summary

This PR updates the transaction retry backoff algorithm in UniversalDB to better align with FoundationDB's behavior. The changes improve the retry strategy by capping attempts at 7 (resulting in max ~1.4s) and using proportional jitter.

Code Quality & Best Practices

Positive

  • Excellent documentation with clear examples showing backoff progression
  • Good references to FDB source code for context
  • Clean implementation with clear variable names

Issues Found

1. Critical: Inconsistent Implementation in Test File

engine/packages/universaldb/tests/rocksdb.rs:140-146 contains a duplicate copy of this function with the old implementation:

pub fn calculate_tx_retry_backoff(attempt: usize) -> u64 {
    let base_backoff_ms = 2_u64.pow((attempt as u32).min(10)) * 10;
    let jitter_ms = rand::random::<u64>() % 100;
    base_backoff_ms + jitter_ms
}

Impact: Tests are using the old backoff logic, making them inconsistent with production code.

Fix: Remove the duplicate function from the test file and import it from the main module instead:

use universaldb::calculate_tx_retry_backoff;

2. Bug: Incorrect Jitter Calculation

Line 46: let jitter_ms = rand::random::<u64>() % base;

Issue: The comment says "Jitter is 0-10% of backoff ms" but the code implements 0-base where base is 2^attempt.

Examples:

  • Attempt 0: base=1, so jitter is always 0ms (docs say "0-1ms")
  • Attempt 1: base=2, so jitter is 0-1ms (docs say "0-2ms")
  • Attempt 7: base=128, so jitter is 0-127ms (docs say "0-128ms")

So the jitter IS proportional to the base, but `% base` yields 0 to base-1, leaving the upper bound one below the docstring's. The comment is otherwise accurate:

  • For attempt 7: base_backoff=1280ms, jitter=0-127ms ≈ 10%

Clarity Fix: Consider updating the comment to be more explicit:

// Jitter is just under 0-10% of backoff ms (ranges from 0 to base-1 milliseconds)
let jitter_ms = rand::random::<u64>() % base;

Performance Considerations

Improved Performance

  • Max backoff reduced from 10,240ms (old: 2^10 * 10) to 1,280ms (new: 2^7 * 10)
  • Jitter reduced from fixed 0-100ms to proportional 0-10%, which is better for early retries
  • These changes align better with FDB's 1s max backoff mentioned in the docs

Potential Concern: The reduced max backoff might lead to more aggressive retries under high contention. However, this aligns with FDB's design, so it should be acceptable.

Security Concerns

No security issues identified. The use of rand::random() for jitter is appropriate for this use case (not cryptographic).

Test Coverage

Missing

  • No unit tests for the backoff calculation function
  • No tests verifying the backoff progression matches documented values
  • No tests for edge cases (attempt 0, attempt greater than 7, etc.)

Recommendation: Add tests like:

#[test]
fn test_backoff_progression() {
    // Test that backoff values are within expected ranges
    for attempt in 0..10 {
        let backoff = calculate_tx_retry_backoff(attempt);
        let expected_base = 2_u64.pow((attempt as u32).min(7)) * 10;
        let expected_max_jitter = 2_u64.pow((attempt as u32).min(7));
        
        assert!(backoff >= expected_base);
        assert!(backoff < expected_base + expected_max_jitter);
    }
}

#[test]
fn test_backoff_max_cap() {
    // Verify attempts beyond 7 use the same base as attempt 7.
    // Jitter is 0-127ms on top of the 1280ms base, so `/ 10 * 10` cannot
    // strip it reliably; assert on the full range instead.
    for attempt in [7usize, 8, 100] {
        let backoff = calculate_tx_retry_backoff(attempt);
        // Base is 1280ms for every capped attempt; only jitter differs.
        assert!(backoff >= 1280 && backoff < 1408);
    }
}

Action Items

  1. Remove duplicate function from tests/rocksdb.rs and import from module
  2. Add unit tests for the backoff calculation
  3. Consider clarifying the jitter comment (optional)

Verdict

The core algorithm change is correct and improves alignment with FDB, but the duplicate function in tests needs to be addressed before merging.

@MasterPtato MasterPtato marked this pull request as ready for review November 18, 2025 23:43
@claude

claude bot commented Nov 18, 2025

PR Review: fix(udb): update backoff algo

This PR updates the transaction retry backoff algorithm in UniversalDB to more closely align with FoundationDB implementation. Overall, this is a solid improvement with better documentation and cleaner code organization.

Strengths

  1. Improved Algorithm: The new backoff calculation is closer to FoundationDB behavior with reduced max exponent from 10 to 7 (max backoff ~1280ms vs previous ~10,240ms), smarter jitter calculation, and better documentation.

  2. Code Organization: Moving calculate_tx_retry_backoff from test code to utils/mod.rs is correct - it is a production utility used across multiple database drivers.

  3. Documentation: Excellent documentation comparing the implementation with FDB actual code with GitHub links.

  4. Minor Fixes: The formatting changes in api-peer and api-public kv_get.rs files improve readability.

Critical Issue: Potential Arithmetic Bug

In engine/packages/universaldb/src/utils/mod.rs:46, the jitter calculation has a mismatch between code and comment. The comment says "Jitter is 0-10% of backoff ms" but the code does rand::random::<u64>() % base, which gives 0 to base-1, not 0-10% of base_backoff_ms.

Recommendation: Based on your examples in the docs showing "0 -> 10ms + 0-1ms jitter" where base=1, the code seems correct. Update the comment to match: "Jitter is 0 to (base-1) ms"

Other Observations

TypeScript Changes: Moving assert import to Node built-in is good. The TODO comment about renaming createTs to update_ts is helpful - consider creating a tracking issue.

Performance Impact: The new algorithm reduces maximum wait time from ~10s to ~1.3s which improves user experience and system responsiveness under contention.

Test Coverage: Well-tested in production code. Consider adding unit tests for calculate_tx_retry_backoff to verify backoff values and jitter bounds.

Recommendations

  1. [Critical] Clarify the jitter calculation - fix either the comment or the code
  2. Add unit tests for calculate_tx_retry_backoff
  3. Consider creating a GitHub issue for the TypeScript createTs rename

Summary

Well-thought-out improvement. The only blocker is clarifying the jitter calculation to ensure code and comments match. Once resolved, this should be good to merge.


Review generated with Claude Code
